Abstract

Several important complex network measures that helped discover common patterns
across real-world networks ignore edge weights, an important source of information
in real-world networks. We propose a new methodology for generalizing measures of
unweighted networks through a generalization of the cardinality concept of a set of
weights. The key observation here is that many measures of unweighted networks use
the cardinality (the size) of some subset of edges in their computation. For example,
the node degree is the number of edges incident to a node. We define the effective
cardinality, a new metric that quantifies how many edges are effectively being used,
assuming that an edge's weight reflects the amount of interaction across that edge.

We prove that a generalized measure, using our method, reduces to the original
unweighted measure if there is no disparity between weights, which ensures that the
laws that govern the original unweighted measure will also govern the generalized
measure when the weights are equal. We also prove that our generalization ensures a
partial ordering (among sets of weighted edges) that is consistent with the original
unweighted measure, unlike previously developed generalizations. We illustrate the
applicability of our method by generalizing four unweighted network measures. As a
case study, we analyze four real-world weighted networks using our generalized degree
and clustering coefficient. The analysis shows that the generalized degree distribution
is consistent with the power-law hypothesis but with a steeper decline, and that there
is a common pattern governing the ratio between the generalized degree and the
traditional degree. The analysis also shows that nodes with more uniform weights tend
to cluster with nodes that also have more uniform weights among themselves.

1 Introduction 

Mining and analyzing complex networks have received significant attention in recent
years due to the explosive growth of social networks and the discovery of common
patterns that govern a wide range of real-world networks [22, 4, 11, 8, 19, 10, 21].
The core of mining complex networks is network measures, which are functions that
summarize the network structure into simpler numeric values. These measures are
generally classified into two main classes: measures that ignore edge weights and
focus primarily on the structure of the graph, which we call unweighted measures, and
measures that take edge weights into account (in addition to the structure), which we
call weighted measures.

Unweighted measures received the bulk of researchers' attention, due to their
simplicity, intuitiveness, and relative ease of computation. This attention resulted
in several influential findings such as the small world (which relies on the
clustering coefficient) [22] and the power law (which relies on the degree
distribution) [4, 10]. Despite their popularity, unweighted measures ignore important
network information: the weights. Consequently, several measures were developed in
order to take weights into account. The use of weighted measures, however, is still
dwarfed by the use of unweighted measures in analyzing complex networks
[11, 13, 10, 21].

The widespread usage of unweighted network measures motivated the search for
generalizations of unweighted measures that take weights into account [5, 2, 20]. We
propose here a new methodology for generalizing measures of unweighted networks
through a generalization of the cardinality concept of a set of weighted edges. The
key observation here is that many measures of unweighted networks use the cardinality
(the size) of some subset of edges in their computation. For example, the node degree
is the number of edges incident to a node. The clustering coefficient of a node is the
ratio between the number of edges between its neighbors and the number of all possible
edges among the neighbors.¹ We propose the effective cardinality metric, a novel
extension of the traditional set cardinality metric that quantifies the number of
edges effectively being used among a set of weighted edges. By simply replacing the
traditional cardinality with the effective cardinality, one can generalize unweighted
network measures to take weights into account. The central assumption here is that an
edge's weight reflects the amount of interaction across that edge, which is a
reasonable assumption in many real domains. For example, an edge weight can represent
the number of times a person calls a friend, the number of packets transmitted on an
Internet link, or the number of papers co-authored by two scientists.

Unlike most of the previous work, which focused on generalizing individual measures
[5, 20], we provide a generalization methodology that applies to a large number of
unweighted measures. More importantly, we prove that our generalized measures become
identical to the original unweighted measures if there is no disparity between weights
(i.e. all the weights are equal). This property ensures that all the results that
apply to an unweighted measure directly follow for our generalization of the same
measure if the weights are equal. We also prove that the effective cardinality, the
heart of our generalization, imposes a partial ordering among sets of weighted edges
that is consistent with the traditional cardinality. Intuitively this means that the
smaller the disparity between weights, the closer the generalized measure is to the
unweighted measure. This point will become clearer in Section 2, where we discuss in
detail the different properties of the effective cardinality. None of the previous
attempts at generalizing unweighted measures [5, 2, 20] provided similar guarantees.

Because our generalization of unweighted measures takes weights into account and yet
upholds properties of the original unweighted measures, our generalization bridges the
gap between the extensive research made using the unweighted network measures and the
research on weighted networks. Furthermore, it allows more accurate analysis of the
networks that were previously analyzed using unweighted network measures. For example,
it is known that the degree distribution of several real-world networks is consistent
with the power law hypothesis, while that of other networks is not [10]. But if one
takes the disparity of weights into account, will the effective degree distribution
reveal similar observations?

¹ Other examples include heterophilicity and dyadicity. We describe these measures in further detail later.

We illustrate the applicability of our method by generalizing four well-known un- 
weighted measures: the node degree, the clustering coefficient, the dyadicity, and the 
heterophilicity. Furthermore, as a case study, we analyze four real-world, weighted, so- 
cial networks using two of our generalized measures: the C-degree and the C-clustering 
coefficient (the letter C stands for continuous and denotes our generalization of an un- 
weighted measure). The analysis shows that the C-degree distribution is consistent with 
the power-law hypothesis, similar to the traditional degree distribution, but with steeper 
decline (larger exponent). Furthermore, there is a common pattern governing the ratio 
between the C-degree and the traditional degree. The analysis, using the C-clustering 
coefficient, shows that nodes with more uniform weights tend to cluster with nodes 
that also have more uniform weights among themselves. These findings confirm that 
although our generalization methodology takes weights into account, the generalized 
measures still behave in a manner similar to the corresponding unweighted measures, 
but with more information revealed through the incorporation of weights. 

The paper is organized as follows. Section 2 describes our proposed effective
cardinality metric and provides proofs of some of its interesting properties. Section 3
illustrates how our approach can be used in generalizing unweighted network measures,
with more focus on the degree and the clustering coefficient measures (an analysis of
four real-world weighted networks is also provided). Section 4 reviews the related
work. We conclude in Section 5.

2 The Effective Cardinality 

Let $E' = \{e_1, \ldots, e_n\} \subseteq E$ be the subset of edges that are used in
computing a particular network measure, where $n$ is the number of edges in $E'$ and
$E$ is the set of all network edges. For weighted networks, each edge $e \in E$ has a
corresponding weight
$w(e)$, where $\forall e \in E : w(e) \geq \epsilon > 0$ (this directly follows from our assumption that
an edge's weight quantifies the amount of interaction over the edge).

As we briefly mentioned in the previous section, and illustrate in more detail in
Section 3, many unweighted network measures rely in their computation on the
cardinality of some subset of edges, or $n = |E'|$. For example, for a node $i$, the
degree $k(i) = |E_i|$, where $E_i$ is the set of edges incident to node $i$. Also for
node $i$, the clustering coefficient $z(i) = \frac{|E_{N(i)}|}{MAX_i}$, where
$E_{N(i)}$ is the set of edges between node $i$'s neighbors and $MAX_i$ is the maximum
number of edges that can be between these neighbors (i.e. if node $i$'s neighbors
formed a clique).

When weighted networks were analyzed using unweighted network measures, weights came
into play through a defined cutoff threshold: an edge is included in the graph if its
weight is above a threshold, otherwise the edge is excluded [9, 12]. The computation
of any unweighted measure then took place naturally. Such an approach, however, did
not properly handle the disparity of interaction among neighbors, but rather
approximated a weighted network with an unweighted network.
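
The cutoff approach can be sketched in a few lines. The edge list, node names, and threshold values below are hypothetical and chosen only for illustration; the function name `threshold_degree` is not from the paper.

```python
def threshold_degree(edges, threshold):
    """Degree of every node after dropping edges whose weight is not above the threshold."""
    degree = {}
    for u, v, w in edges:
        degree.setdefault(u, 0)
        degree.setdefault(v, 0)
        if w > threshold:  # the edge survives the cutoff and counts as unweighted
            degree[u] += 1
            degree[v] += 1
    return degree


# Hypothetical star around node "a" with incident weights 9, 5, 5, 1.
edges = [("a", "b", 9.0), ("a", "c", 5.0), ("a", "d", 5.0), ("a", "e", 1.0)]
low_cut = threshold_degree(edges, 2.0)["a"]   # keeps the edges weighted 9, 5, 5
high_cut = threshold_degree(edges, 6.0)["a"]  # keeps only the edge weighted 9
print(low_cut, high_cut)
```

In this sketch the degree of the same node changes with the chosen cutoff, and every surviving edge counts equally regardless of its weight, which is the limitation the paragraph above describes.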

The main limitation of the traditional cardinality function (and consequently all 
the unweighted network measures that use it) is that it ignores edge weights. In other 
words, the cardinality implicitly assumes uniform weight over the edges, which can 
result in giving an incorrect perception of the effective use of edges. For example, a 
person may have 10 or more acquaintances but mainly interacts with only two of them 
(friends). Should that person be considered 2 times more connected than a person with 
only 5 acquaintances but also interacting primarily with two of them? 

For concreteness, let us consider a specific numeric example. Suppose there are four
sets of edges with corresponding sets of weights $W_1 = \{5, 5, 5, 5\}$,
$W_2 = \{9, 5, 5, 1\}$, $W_3 = \{9, 8, 2, 1\}$ and $W_4 = \{20, 0, 0, 0\}$. The
cardinalities of these weight sets are all the same and equal 4. Intuitively, however,
if the weights reflect the interaction over edges, then not all the edges are being
used equally and the traditional cardinality becomes a crude approximation. Instead,
we want a function that summarizes a set of weighted edges into a single real number
and has two properties. If edge weights are equal, then the function we are looking
for should assign a value equal to the cardinality. When the weights are not equal,
the function should assign a value between 1 and the cardinality (maximum) such that
the more equal the weights are, the higher the function. The first property ensures
that the important results and the intuitiveness obtained through the use of the
traditional cardinality in unweighted measures carry over to the generalized
cardinality and the corresponding generalized measure. The second property ensures
that the generalized measure offers more valuable information than the traditional
cardinality. Using the above example, we are looking for a function that assigns 4 to
$W_1$, 1 to $W_4$, and values between 1 and 4 for $W_2$ and $W_3$, with the value
assigned to $W_2$ greater than the value assigned to $W_3$ (because the two inner
weights are equal in the case of $W_2$).

Such a generalization of the cardinality measure will allow straightforward
generalization of many unweighted network measures (by simply substituting the
traditional cardinality with the generalized cardinality function). Simple functions
for summarizing sets (such as the average, the variance, and the summation) can be
very useful in summarizing weights, but they do not satisfy the two desired properties
mentioned above. The heart of our generalization is a novel definition of the
cardinality of a set of edges $E'$ that takes weights into account, which we call the
effective cardinality, or

$$c(E') = \begin{cases} 0 & \text{if } E' \text{ is empty} \\ 2^{\sum_{e \in E'} \frac{w(e)}{\sum_{o \in E'} w(o)} \log_2 \frac{\sum_{o \in E'} w(o)}{w(e)}} & \text{otherwise} \end{cases}$$

Intuitively, the quantity $\frac{w(e)}{\sum_{o \in E'} w(o)}$ represents the
probability of an interaction over an edge $e$ among all the edges in $E'$. The set
$\{\frac{w(e)}{\sum_{o \in E'} w(o)} : e \in E'\}$ is a probability distribution and
the quantity
$H(E') = \sum_{e \in E'} \left[ \frac{w(e)}{\sum_{o \in E'} w(o)} \log_2 \frac{\sum_{o \in E'} w(o)}{w(e)} \right]$
is the entropy of this probability distribution, which measures the disparity between
the weights: the more uniform the weights are, the higher the entropy, and vice
versa.² The purpose of raising 2 to the power of the entropy is to convert the entropy
back to the number of edges that are effectively being used.
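
As a minimal sketch of this definition, the effective cardinality can be computed as 2 raised to the entropy (in bits) of the normalized weights. The function name `effective_cardinality` and the handling of an all-zero weight set (returning 0) are assumptions of this sketch; the text only defines the empty set explicitly.

```python
import math


def effective_cardinality(weights):
    """Return 2**H, where H is the entropy (in bits) of the normalized weights."""
    total = sum(weights)
    if not weights or total <= 0:
        # Empty set: zero by definition. Treating an all-zero set the same way
        # is an assumption of this sketch, not something the text specifies.
        return 0.0
    entropy = 0.0
    for w in weights:
        if w > 0:  # a zero weight contributes nothing, since x*log2(1/x) -> 0
            p = w / total
            entropy += p * math.log2(1.0 / p)
    return 2.0 ** entropy


for ws in ([5, 5, 5, 5], [9, 5, 5, 1], [9, 8, 2, 1], [20, 0, 0, 0], [10, 0.01]):
    print(ws, round(effective_cardinality(ws), 4))
```

Run on the weight sets used in this section, the sketch gives values that agree with the figures quoted in the text to the stated precision.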

Before discussing the important properties of the effective cardinality, let us first
consider a few numeric examples that illustrate the intuition behind the effective
cardinality. Consider the set of weights $W = \{10, 0.01\}$. The traditional
cardinality of this set is 2. However, if weights quantify the amount of interaction
over edges, then the edge with weight 10 is significantly more important than the
other edge in this set and the cardinality should be closer to 1 than 2. The effective
cardinality, as will be clear from Lemma 2, captures this by returning the number of
edges of equal weights that have the same effective cardinality. For
$W = \{10, 0.01\}$, $c(\{10, 0.01\}) = 1.008$, so even though the set $W$ has two
edges, the effective cardinality is equivalent to only 1.008 edges with uniform
weights.

For the numeric example we have mentioned earlier, $c(W_1 = \{5, 5, 5, 5\}) = 4$,
$c(W_2 = \{9, 5, 5, 1\}) = 3.3276$, $c(W_3 = \{9, 8, 2, 1\}) = 3.0219$ and
$c(W_4 = \{20, 0, 0, 0\}) = 1$, which satisfy the intuitive ordering we described
earlier in this section. This ordering is not just by chance or due to a special case,
but is actually guaranteed by our proposed effective cardinality. The effective
cardinality satisfies three intuitive properties (proofs are given shortly after):

1. Preserving maximum cardinality: $\forall E' : c(E') \leq |E'|$. Furthermore,
$c(E') = |E'|$ iff $\forall e \in E' : w(e) = C$, where $C$ is some constant. In other
words, the effective cardinality is maximum and equals the original cardinality when
there is no disparity between weights.

2. Preserving minimum cardinality: $c(E') = 0$ iff $E'$ is an empty set. Furthermore,
$c(E') = 1$ iff $\exists u \in E' : w(u) > 0$ and $\forall v \neq u : w(v) = 0$. In
other words, the effective cardinality is one when all edges, except one edge, have
zero weights.

3. Consistent partial order over weighted sets: any function that maps a set of real
numbers (weights) to a single real number imposes an implicit partial order. The
effective cardinality imposes, arguably, the simplest partial order that is consistent
with the above two properties. If two sets of weighted edges have the same size, the
same summation of weights, and their individual weights are the same except for two
edges, then the set with more uniform weights has higher effective cardinality. A
formal definition of this property is given in Lemma 4.
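
The three properties can also be spot-checked numerically. The sketch below is only an illustration on random weight sets, not a substitute for the proofs; the helper names, the random ranges, and the tolerance are assumptions made for this example.

```python
import math
import random


def effective_cardinality(weights):
    """2**entropy of the normalized weights (0 for an empty or all-zero set)."""
    total = sum(weights)
    if not weights or total <= 0:
        return 0.0
    return 2.0 ** sum((w / total) * math.log2(total / w) for w in weights if w > 0)


def check_properties(trials=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(2, 8)
        weights = [rng.uniform(0.1, 10.0) for _ in range(n)]
        c = effective_cardinality(weights)
        # Property 1: the effective cardinality never exceeds the cardinality.
        assert c <= n + 1e-9
        # Property 3: make two weights more uniform while keeping their sum,
        # leaving all other weights untouched; c must not decrease.
        hi, lo = max(weights[:2]), min(weights[:2])
        shift = rng.uniform(0.0, (hi - lo) / 2)
        more_uniform = [hi - shift, lo + shift] + weights[2:]
        assert effective_cardinality(more_uniform) >= c - 1e-9
    return True


print(check_properties())
print(effective_cardinality([2.0] * 6))       # equal weights: the cardinality
print(effective_cardinality([7.0, 0.0, 0.0])) # one non-zero weight: minimum of 1
```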

The intuition of the three properties can be clarified through the numeric example
mentioned earlier. The three properties require the effective cardinality measure to
impose the following ordering: $|W_1| = c(W_1) > c(W_2) > c(W_3) > c(W_4) = 1$. We
prove each of these properties in the remainder of this section. Note that the
ensemble approach [2] (which is described in more detail in Section 4) made no
guarantees with regard to the partial order over the set of weighted edges. For
example, the two sets of weights $W_2 = \{9, 5, 5, 1\}$ and $W_3 = \{9, 8, 2, 1\}$
will have the same generalized degree under the ensemble method. Using the effective
cardinality, the generalized degree (described in detail in Section 3.1) of $W_2$ is
strictly higher than the generalized degree of $W_3$.

² Note that the quantity $x \log_2 \frac{1}{x} \to 0$ as $x \to 0$ or $x = 1$.

Theorem 1 The effective cardinality satisfies the three properties described above: the 
maximum cardinality, the minimum cardinality, and the consistent partial ordering. 

Proof The proof follows from the following three lemmas. 

Lemma 2 The effective cardinality satisfies the maximum cardinality property. 

Proof When all the weights are equal to a constant $C$ we have

$$\frac{w(e)}{\sum_{o \in E'} w(o)} = \frac{C}{C|E'|} = \frac{1}{|E'|}$$

We then have

$$c(E') = 2^{\sum_{e \in E'} \frac{1}{|E'|} \log_2(|E'|)} = 2^{\log_2(|E'|)} = |E'|$$

In other words, both the cardinality and the effective cardinality of a weighted set
of edges become equivalent when the weights are uniform. The effective cardinality is
also maximum in this case, because the exponent is the entropy of the weight
probability distribution, which is maximum when weights are uniform over edges.

Lemma 3 The effective cardinality satisfies the minimum cardinality property. 

Proof When the set of edges is empty, the effective cardinality is zero by definition.
When all weights are zero except only one weight that is greater than zero, the weight
probability distribution is deterministic and the entropy is zero, therefore the
effective cardinality will be 1.

Lemma 4 The effective cardinality satisfies the consistent partial order property. 

Proof Let $E'_1$ and $E'_2$ be two (edge) sets such that $|E'_1| = |E'_2|$ (both have
the same cardinality). Let $W_1$ and $W_2$ be the corresponding sets of weights, where
$\sum_{e_1 \in E'_1} w(e_1) = \sum_{e_2 \in E'_2} w(e_2) = S$ (the total weights are
equal). Furthermore, let $|W_1 \cap W_2| = n - 2$, $\{w_{11}, w_{12}\} = W_1 - W_2$,
$\{w_{21}, w_{22}\} = W_2 - W_1$, where the '$-$' operator is the "set difference"
operator (the two sets share the same weights except for two elements in each set),
and $|w_{11} - w_{12}| < |w_{21} - w_{22}|$ (the weights of $W_1$ are more uniform
than the weights of $W_2$). To prove that the effective cardinality satisfies the
consistent partial ordering property, we need to prove that $c(E'_1) > c(E'_2)$.

Without loss of generality, we can assume that $w_{11} \geq w_{12}$ and
$w_{21} \geq w_{22}$, therefore $w_{11} - w_{12} < w_{21} - w_{22}$. We then have

$$w_{11} + w_{12} = S - \sum_{w \in W_1 \cap W_2} w = w_{21} + w_{22}$$

or

$$\frac{w_{11} + w_{12}}{S} = 1 - \sum_{w \in W_1 \cap W_2} \frac{w}{S} = \frac{w_{21} + w_{22}}{S} = L$$

therefore

$$\frac{w_{21}}{S} > \frac{w_{11}}{S} \geq \frac{L}{2}$$

where $\frac{w_{12}}{S} = L - \frac{w_{11}}{S}$ and
$\frac{w_{22}}{S} = L - \frac{w_{21}}{S}$. Then from Lemma 5 we have
$h(L, \frac{w_{11}}{S}) > h(L, \frac{w_{21}}{S})$, or

$$-\frac{w_{11}}{S} \lg \frac{w_{11}}{S} - \frac{w_{12}}{S} \lg \frac{w_{12}}{S} > -\frac{w_{21}}{S} \lg \frac{w_{21}}{S} - \frac{w_{22}}{S} \lg \frac{w_{22}}{S}$$

Therefore $H(E'_1) > H(E'_2)$, because the rest of the entropy terms (corresponding to
$W_1 \cap W_2$) are equal, and consequently $c(E'_1) > c(E'_2)$.
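
As a worked instance of this argument, take the two weight sets from the earlier numeric example, with $\{9, 5, 5, 1\}$ in the role of the more uniform set and $\{9, 8, 2, 1\}$ in the role of the less uniform one. Here the sets are read as multisets that share the weights 9 and 1, which is an assumption made to fit the example to the lemma's setting; $h$ is the quantity $h(C, x) = -x \lg x - (C - x) \lg (C - x)$ used in the proof.

```latex
\begin{align*}
S &= 20, \qquad L = 1 - \tfrac{9 + 1}{20} = 0.5 \\
h(0.5,\, 0.25) &= -0.25 \lg 0.25 - 0.25 \lg 0.25 = 1 \\
h(0.5,\, 0.4) &= -0.4 \lg 0.4 - 0.1 \lg 0.1 \approx 0.5288 + 0.3322 = 0.8610 \\
H(\{9,5,5,1\}) - H(\{9,8,2,1\}) &\approx 1.7345 - 1.5955 = 0.1390 \approx 1 - 0.8610
\end{align*}
```

The entropy gap between the two sets comes entirely from the two differing weights, and $2^{1.7345} \approx 3.3276 > 2^{1.5955} \approx 3.0219$, matching the values quoted earlier in this section.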



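Lemma 5 The quantity $h(C, x) = -x \lg(x) - (C - x) \lg(C - x)$ is symmetric around
$\frac{C}{2}$ and maximized at $x = \frac{C}{2}$ for $C > x > 0$.

Proof

$$h(C, C - x) = -(C - x) \lg(C - x) - x \lg(x) = h(C, x)$$

Therefore $h(C, x)$ is symmetric around $C/2$. Furthermore, $h(C, x)$ is maximized when

$$\frac{dh(C, x)}{dx} = 0$$

or

$$0 = -\frac{1}{\ln 2} - \lg x + \frac{1}{\ln 2} + \lg(C - x)$$

$$\lg x = \lg(C - x)$$

Therefore $h(C, x)$ is maximized at $x = C - x = \frac{C}{2}$. ■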


3 Generalizing Unweighted Network Measures Using 
Effective Cardinality 

In principle, any unweighted network measure that uses the cardinality of some subset
of edges can be generalized using the effective cardinality. In fact, while we limited
the discussion so far to sets of weighted edges, all the proofs in Section 2 apply to
any sets of weights, even if elements in the sets represent subgraphs, not edges. So
for example, if we are interested in counting triangles of three connected vertices
(which are used in some definitions of the clustering coefficient), we can use the
effective cardinality to replace the discrete count with a continuous spectrum.

We here present four example generalizations of unweighted network measures: the
degree, the clustering coefficient, the dyadicity, and the heterophilicity. The
resulting generalized measures inherit the three properties of the effective
cardinality. Table 1 summarizes these generalizations.



Measure                            Unweighted                      Generalized
Degree of node i                   $|E_i|$                         $c(E_i)$
Clustering coefficient of node i   $|E_{N(i)}| / MAX_i$            $c(E_{N(i)}) / MAX_i$
Dyadicity of a graph               $|E_{within}| / n_{within}$     $c(E_{within}) / n_{within}$
Heterophilicity of a graph         $|E_{across}| / n_{across}$     $c(E_{across}) / n_{across}$

Table 1: The summary of the generalization of four unweighted measures, where $E_i$ is
the set of edges incident to node $i$, $E_{N(i)}$ is the set of edges between neighbors
of node $i$, $E_{within}$ is the set of edges within a class of nodes, and $E_{across}$
is the set of edges across two classes of nodes.

The dyadicity and heterophilicity were recently used to study the correlation be- 
tween the types of nodes (node classes) and the network structure IqJ\ . The dyadicity 
of a graph equals $\frac{|E_{within}|}{n_{within}}$, where $E_{within}$ is the set of
edges within a set of nodes of the same type (a class of nodes) and $n_{within}$ is
the expected number of edges within the same class of nodes if there was no
correlation between the node class and the network structure. Intuitively, the
dyadicity quantifies the strength of connections between nodes of the same type and
whether it is above average.³ The heterophilicity of a graph equals
$\frac{|E_{across}|}{n_{across}}$, where $E_{across}$ is the set of edges across two
classes of nodes and $n_{across}$ is the expected number of edges across the two
classes if there was no correlation between the node class and the network structure.
The heterophilicity quantifies the strength of connections across two classes
(communities) of nodes and whether it is above average. The dyadicity can be
generalized, using the effective cardinality, to be $\frac{c(E_{within})}{n_{within}}$
and similarly the heterophilicity can be generalized to be
$\frac{c(E_{across})}{n_{across}}$.
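
A minimal sketch of the two generalized measures follows, assuming the generalized dyadicity is $c(E_{within}) / n_{within}$ and the generalized heterophilicity is $c(E_{across}) / n_{across}$. The expected counts `n_within` and `n_across` are taken as inputs because this section does not give a formula for them; the function names, the toy graph, and the expected counts used in the example are hypothetical.

```python
import math


def effective_cardinality(weights):
    """2**entropy of the normalized weights (0 for an empty or all-zero set)."""
    total = sum(weights)
    if not weights or total <= 0:
        return 0.0
    return 2.0 ** sum((w / total) * math.log2(total / w) for w in weights if w > 0)


def generalized_dyadicity(edges, node_class, cls, n_within):
    """c(E_within) / n_within for the edges with both endpoints in class `cls`."""
    within = [w for u, v, w in edges if node_class[u] == cls and node_class[v] == cls]
    return effective_cardinality(within) / n_within


def generalized_heterophilicity(edges, node_class, n_across):
    """c(E_across) / n_across for the edges joining nodes of different classes."""
    across = [w for u, v, w in edges if node_class[u] != node_class[v]]
    return effective_cardinality(across) / n_across


# Hypothetical toy graph: a uniformly weighted triangle in class "x",
# one edge crossing to class "y", and one edge inside class "y".
edges = [(1, 2, 5.0), (2, 3, 5.0), (1, 3, 5.0), (3, 4, 1.0), (4, 5, 2.0)]
node_class = {1: "x", 2: "x", 3: "x", 4: "y", 5: "y"}
dyadicity_x = generalized_dyadicity(edges, node_class, "x", n_within=2.0)
heterophilicity = generalized_heterophilicity(edges, node_class, n_across=1.5)
print(dyadicity_x, heterophilicity)
```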

The degree and the clustering coefficient, of a particular node, are two of the most 
widely used unweighted measures, so the remainder of this section focuses on their 
generalization (using the effective cardinality) and illustrates their use in analyzing 
four real-world, weighted networks. 

3.1 Generalizing the Degree 

As mentioned earlier, the degree is a key measurement that has been used extensively 
in analyzing networks. A node's degree is the number of edges incident to the node, 
or \Ei\, where Ei is the set of edges incident to node i. The degree distribution is 
a common method for summarizing the degrees of all network nodes into one mea- 
sure that characterizes the network. The degree measure and its distribution were used 


³ There are other network measures that also quantify the strength of connections within a class (community) of nodes, such as the modularity measure [17].

Figure 1: Example weighted network of four nodes, comparing the (discrete) degree
against the C-degree. The degree distribution illustrates the benefit of taking weights
into account in distinguishing nodes. (a) Network of four nodes, where k is the
out-degree of a node and r is the continuous out-degree of a node. (b) The degree
distribution P(k) of the network in (a). (c) The continuous degree distribution P(r)
of the network in (a).


extensively in analyzing networks and helped discovering common patterns, particu- 
larly the power law PHTTllTl fTOlfTSjl . A degree distribution follows the power law if 
$P(k) \propto k^{-\alpha}$, where $k$ is the degree, $\alpha$ is a constant, and $P(k)$ is the degree distribution.

Using our definition of effective cardinality, a generalization of the degree mea- 
sure, which we call the continuous degree or the C-degree, is given by the following 
equation: 

Definition 6 The C-degree of a node $i$ in a network is $r(i)$, where

$$r(i) = \begin{cases} 0 & \text{if } i \text{ is disconnected} \\ 2^{\sum_{e \in E_i} \frac{w(e)}{s(i)} \log_2 \frac{s(i)}{w(e)}} & \text{otherwise} \end{cases}$$

where $E_i$ is the set of edges incident to node $i$ and $s(i)$ is the strength of
node $i$ (the sum of the weights of the edges in $E_i$). Figure 1 compares the
continuous degree distribution to the (discrete) degree distribution in a simple
weighted network of four nodes. A node on the boundary has an out-degree of 1, while
an internal node has an out-degree of 2. Intuitively, however, only one of the
internal nodes is fully utilizing its degree of 2 (the one to the left), while the
other node (to the right) is mostly using one neighbor only. The C-degree measure
captures this and shows that the internal node to the left has a C-degree of
$c(\{0.5, 0.5\}) = 2$ while the other internal node has a C-degree of
$c(\{0.9, 0.1\}) = 2^{H(0.9, 0.1)} = 1.38$.
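
The C-degree of this definition can be sketched as the effective cardinality of a node's out-edge weights. The adjacency layout, node names, and function name below are assumptions; the four-node chain only mimics the weights quoted in the text (0.5/0.5 and 0.9/0.1 for the two internal nodes), since the figure itself is not reproduced here.

```python
import math


def effective_cardinality(weights):
    """2**entropy of the normalized weights (0 for an empty or all-zero set)."""
    total = sum(weights)
    if not weights or total <= 0:
        return 0.0
    return 2.0 ** sum((w / total) * math.log2(total / w) for w in weights if w > 0)


def c_degree(adjacency, node):
    """C-degree r(i): effective cardinality of the weights on edges leaving `node`."""
    return effective_cardinality(list(adjacency.get(node, {}).values()))


# Hypothetical chain a - b - c - d; each entry maps a node to its out-edge weights.
adjacency = {
    "a": {"b": 1.0},
    "b": {"a": 0.5, "c": 0.5},  # internal node using both neighbors equally
    "c": {"b": 0.9, "d": 0.1},  # internal node using mostly one neighbor
    "d": {"c": 1.0},
}
for node in adjacency:
    print(node, len(adjacency[node]), round(c_degree(adjacency, node), 2))
```

Both internal nodes have a discrete out-degree of 2, while their C-degrees differ, which is the distinction the example in the text draws.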

The C-degree inherits the three properties we described earlier with respect to the
traditional node degree. The C-degree of a node is maximum and equals the traditional
discrete degree when all the weights incident to the node are equal. The C-degree of a
connected node is minimum and equals one if all edges incident to the node have zero
weights except one edge that has a weight greater than zero. And finally, everything
else being equal, a node with more uniform weights incident to it has a higher
C-degree than a node with less uniform weights incident to it. As mentioned earlier,
the three properties ensure that the four sets of weights
$W(v_1) = \{5, 5, 5, 5\}$, $W(v_2) = \{9, 5, 5, 1\}$, $W(v_3) = \{9, 8, 2, 1\}$ and
$W(v_4) = \{20, 0, 0, 0\}$ will have corresponding C-degrees respecting the following
inequality: $k(v_1) = r(v_1) > r(v_2) > r(v_3) > r(v_4) = 1$.

We have analyzed four real-world weighted networks⁴ that capture coauthorships between
scientists. Three of them were extracted from preprints on the E-Print Archive [18]:
condensed matter (an updated version of the original dataset that includes data
between Jan 1, 1995 and March 31, 2005), astrophysics, and high-energy theory. The
fourth network represents coauthorship of scientists in network theory and experiment
[19]. The weight between two scientists $i$ and $j$ reflects the strength of their
collaboration and is given by the equation
$w_{ij} = \sum_m \frac{\delta_i^m \delta_j^m}{n_m - 1}$, where $\delta_i^m = 1$ if
scientist $i$ was a co-author of paper $m$ (and 0 otherwise) and $n_m$ is the number
of co-authors of paper $m$ [15].
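
A sketch of this weighting follows, under the assumption that each paper $m$ with $n_m$ co-authors contributes $1 / (n_m - 1)$ to the weight of every pair of its authors. Skipping single-author papers, the function name, and the toy paper list are assumptions of this sketch, not details given in the text.

```python
from collections import defaultdict
from itertools import combinations


def coauthorship_weights(papers):
    """Map each unordered author pair to the sum of 1/(n_m - 1) over shared papers."""
    weights = defaultdict(float)
    for authors in papers:
        authors = sorted(set(authors))  # ignore duplicate names within one paper
        n = len(authors)
        if n < 2:
            continue  # a single-author paper creates no collaboration edge
        for i, j in combinations(authors, 2):
            weights[(i, j)] += 1.0 / (n - 1)
    return dict(weights)


# Hypothetical papers: one with two authors, one with three, one with a single author.
papers = [["ann", "bob"], ["ann", "bob", "cat"], ["cat"]]
w = coauthorship_weights(papers)
print(w)
```

In this toy input, the pair that shares both multi-author papers accumulates a larger weight than the pairs that share only the three-author paper.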

Figure 2 displays the C-degree distribution (CDD) and the (discrete) degree
distribution (DD) for the four collaboration networks. The figure uses a log-log scale
with the power law fit based on [10].⁵ Interestingly, the CDD follows a pattern
similar to the DD, despite taking weights into account. However, the power-law fit for
the CDD has a steeper decline (higher exponent $\alpha$) than the DD.

One would expect that as the degree of a node increases, the node will interact
primarily with a smaller subset of neighbors, particularly in social networks where
humans have limited communication capacity. To verify this intuition, we define the
degree utilization metric as the ratio between the C-degree and the degree of a node:
$u(v) = \frac{r(v)}{k(v)}$. The degree utilization metric captures the percentage of
links that a node uses effectively, therefore we expect the degree utilization to
decrease as the degree increases. Figure 3 plots the degree utilization against the
(discrete) degree for the four collaboration networks. A common pattern emerges in the
four networks. For low degrees, the degree utilization is relatively high (a node with
few links makes the best of them). For node degrees greater than some constant, the
bias towards high degree utilization disappears. However, and to our surprise, a cone
is observed, which starts wide at low degrees and gets narrower as the degree
increases (the average degree utilization is plotted as a line in the figure). In
other words, for degrees above some threshold, nodes vary in their utilization of
their available links. However, this variation reduces as the degree increases, while
the mean remains relatively stable.
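
The degree utilization and its per-degree average (the line described for the figure) can be sketched as follows, assuming utilization is the C-degree divided by the discrete degree. The function names, the choice of 0 for an isolated node, and the sample weight lists are assumptions made for this illustration.

```python
import math
from collections import defaultdict


def effective_cardinality(weights):
    """2**entropy of the normalized weights (0 for an empty or all-zero set)."""
    total = sum(weights)
    if not weights or total <= 0:
        return 0.0
    return 2.0 ** sum((w / total) * math.log2(total / w) for w in weights if w > 0)


def degree_utilization(weights):
    """u(v) = r(v) / k(v), with k(v) the number of incident edges."""
    k = len(weights)
    return effective_cardinality(weights) / k if k else 0.0


def mean_utilization_by_degree(incident_weights):
    """Average utilization of all nodes sharing the same discrete degree."""
    buckets = defaultdict(list)
    for weights in incident_weights.values():
        if weights:
            buckets[len(weights)].append(degree_utilization(weights))
    return {k: sum(us) / len(us) for k, us in sorted(buckets.items())}


# Hypothetical nodes, each given by the weights of its incident edges.
nodes = {"v1": [5, 5, 5, 5], "v2": [9, 5, 5, 1], "v4": [20, 0, 0, 0], "v5": [1, 1]}
means = mean_utilization_by_degree(nodes)
print(means)
```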



3.2 Generalizing the Clustering Coefficient 



As mentioned earlier, the clustering coefficient is a measure that quantifies the
clustering or connectivity among a node's neighbors. When averaged over all nodes, the
clustering coefficient represents the connectivity of the whole network. The
clustering coefficient is an important property for identifying small world networks
[22] and is given by the equation $z(i) = \frac{|E_{N(i)}|}{MAX_i}$, where $E_{N(i)}$
is the set of edges between node $i$'s neighbors and $MAX_i$ is the maximum number of
edges that can be between these neighbors.⁶ The generalized clustering coefficient of
a node $i$ using the effective cardinality is:

$$o(i) = \frac{c(E_{N(i)})}{MAX_i}$$

⁴ Available through http://www-personal.umich.edu/~mejn/netdata/

⁵ Source code adopted from http://www.santafe.edu/aaronc/powerlaws/

Figure 2: Comparing the discrete degree distribution (DD) with the continuous degree
distribution (CDD) for the four collaboration networks: (a) condensed matter,
(b) astrophysics, (c) network theory, (d) high-energy theory. The power law fit (PL
fit) is also shown with the associated power.

⁶ Note that, particularly for directed graphs, some researchers argued that a
clustering signature would be more suitable in distinguishing networks [1]. In a
clustering signature, 7 types of directed triangles are counted separately. The
effective cardinality can still be used to replace the discrete counts of these
triangles, but for the purpose of this paper we focus on the simpler, more widely used
definition of the clustering coefficient.

Figure 3: Scatter plot of a node's degree against its degree utilization for the four
collaboration networks (panels include (a) condensed matter and (d) high-energy
theory). The average utilization per degree is also plotted.

Figure 4 provides a simple motivating example of three nodes. The C-clustering
coefficient can help in distinguishing different nodes that are deemed
indistinguishable using the traditional clustering coefficient. For example, both
nodes A and B have a clustering coefficient of 2/2 = 1 (their neighboring nodes have
two edges between them, out of two possible edges). However, B's C-clustering
coefficient is $o(B) = c(\{5, 1\})/2 = 0.78$, while the C-clustering coefficient of A
is $o(A) = c(\{5, 5\})/2 = 1$.
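
The C-clustering coefficient of the example can be sketched as the effective cardinality of the weights on the edges between a node's neighbors, divided by the maximum possible number of such edges. The function names and the directed count $k(k-1)$ for the maximum are assumptions of this sketch, chosen because the example counts two possible edges between two neighbors.

```python
import math


def effective_cardinality(weights):
    """2**entropy of the normalized weights (0 for an empty or all-zero set)."""
    total = sum(weights)
    if not weights or total <= 0:
        return 0.0
    return 2.0 ** sum((w / total) * math.log2(total / w) for w in weights if w > 0)


def max_directed_edges(num_neighbors):
    """Number of directed edges possible among `num_neighbors` nodes."""
    return num_neighbors * (num_neighbors - 1)


def c_clustering(neighbor_edge_weights, max_edges):
    """o(i): effective cardinality of the neighbor-to-neighbor weights over MAX_i."""
    if max_edges <= 0:
        return 0.0  # fewer than two neighbors: no edges are possible
    return effective_cardinality(neighbor_edge_weights) / max_edges


# Weights quoted in the text for nodes A and B; both nodes have two neighbors.
o_a = c_clustering([5, 5], max_directed_edges(2))
o_b = c_clustering([5, 1], max_directed_edges(2))
print(round(o_a, 2), round(o_b, 2))
```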

Figure 4: Example weighted network of three nodes, comparing the (discrete) clustering
coefficient against the C-clustering coefficient. The scatter plot of the degree
against the clustering coefficient illustrates the benefit of taking weights into
account in distinguishing nodes. (a) Network of three nodes; the numbers inside each
node represent the clustering coefficient (top) and the C-clustering coefficient
(bottom). (b) Scatter plot of the degree against the clustering coefficient for the
network in (a); all three nodes have the same degree and clustering coefficient.
(c) Scatter plot of the C-degree against the C-clustering coefficient for the network
in (a); the three nodes are nicely separated in the plot because weights are taken
into account.

Figure 5 shows the scatter plot of the (discrete) clustering coefficient versus the
(discrete) degree (shown in log scale) for the four collaboration networks. The main
observation clear from the graph is that in general, the clustering coefficient
decreases with the increase of the degree.

Figure 6 shows the scatter plot of the C-clustering coefficient versus the C-degree
for the four collaboration networks. The continuous version of the scatter plot
follows the general observation in the discrete case: the clustering coefficient
decreases with the increase of the degree. Nevertheless, the scatter plot for the
continuous measures covers more area, because both the C-degree and the C-clustering
coefficient produce a continuous spectrum of values (unlike the discrete degree and
the discrete clustering coefficient). More importantly, one can observe an interesting
pattern in the continuous scatter plot: nodes with high C-clustering coefficient
(above 0.8) tend to have more discrete C-degree. This is clear from the concentration
of points with high C-clustering coefficient around the discrete degrees, which is not
apparent in points with low clustering coefficient. Using Lemma 2, this observation
means that nodes with incident weights that are more uniform (hence the more discrete
degree) tend to cluster with nodes that have more uniform weights among themselves
(hence the higher clustering coefficient).



4 Related Work 

In general, one can classify weighted network measures into two classes: measures that
generalize unweighted network measures to take weights into account, and measures
that have no connection to unweighted measures. Surveying all weighted measures that
have no connection to unweighted measures is beyond the scope of this paper and has
little relevance to its contribution, which generalizes unweighted network
measures. For completeness, we provide here a sample of these measures that are
related to some unweighted measure (the interested reader may refer to survey papers on
the subject [16, 6, 7]). The strength of a node is the sum of all weights incident
to the node. The strength becomes identical to the node's degree in the very special
case when all the weights are equal to 1, but it imposes a very weak partial ordering
among nodes. For example, all the nodes in Figure 1 have the same strength of 1. The weight
distribution is similar to the degree distribution except that it measures the frequency
of a particular edge weight. A more recent work [14] analyzed a graph's total weight,
$\sum_{e \in E} w(e)$, against the graph's total number of edges, $|E|$, over time. That
work also analyzed the degree of a node, $k(v)$, against the node's strength, $s(v)$.
While useful, the above measures neither capture the disparity of weights among edges
nor provide a methodology for generalizing unweighted measures.

Figure 5: Scatter plot of a node's discrete degree against its discrete clustering coefficient for
the four collaboration networks.
Figure 6: Scatter plot of a node's continuous degree against its continuous clustering coefficient
for the four collaboration networks.



The network measure $Y(v) = \sum_{e \in E(v)} \left( \frac{w(e)}{s(v)} \right)^2$, where
$E(v)$ denotes the set of edges incident to $v$, successfully captured the disparity
of interaction within a node v [3]. However, unlike our generalization, the Y
measure is not a generalization of the degree measure, as it fails to satisfy the first two
properties in Section 2 (if the weights are equal, the Y measure of a node does not
become equal to the node's degree). Furthermore, no guarantee on the partial ordering
imposed by the measure was provided, unlike our methodology, which provides a
guarantee on the partial ordering imposed by our generalization.
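
The equal-weights case mentioned above can be checked numerically. The sketch below
assumes the usual form of the disparity measure, the sum of squared weight shares, and
assumes that the C-degree is 2 raised to the Shannon entropy (in bits) of the normalized
incident weights. Both forms and all function names are assumptions made for illustration.

```python
import math


def disparity_y(weights):
    """Assumed Y measure: sum over incident edges of (w(e) / s(v)) ** 2."""
    strength = sum(weights)
    return sum((w / strength) ** 2 for w in weights)


def c_degree(weights):
    """Assumed C-degree: 2 ** (entropy in bits of the normalized incident weights)."""
    total = sum(weights)
    entropy = -sum((w / total) * math.log2(w / total) for w in weights if w > 0)
    return 2.0 ** entropy


# A node with four incident edges of equal weight has a traditional degree of 4.
equal_weights = [3, 3, 3, 3]
print(disparity_y(equal_weights))  # 1/k for k equal weights, not k
print(c_degree(equal_weights))     # recovers the traditional degree
```

For k equal weights the Y measure evaluates to 1/k, so it shrinks as the degree grows,
whereas the entropy-based C-degree returns k.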

Unlike our methodology, which can be used to generalize several unweighted measures,
there have been several attempts to generalize specific unweighted measures.
The weighted clustering coefficient was an attempt to generalize the clustering
coefficient. The generalization relied on an alternative definition of the clustering
coefficient that used triplets [22]. A triplet connected to a node is a subgraph containing
the original node in addition to two other connected neighbors. The intuition behind
the weighted clustering coefficient for node i is to weigh every edge between two of its
neighbors, j and k, using the weights on edges (i, j) and (i, k). However, unlike our
generalization, the weight on edge (j, k) was ignored.

A recent attempt to generalize the clustering coefficient used the ratio between the
total value of closed triplets and the total value of all triplets [20]. The authors proposed
four functions to evaluate (summarize) weighted triplets: the arithmetic mean, the
geometric mean, the minimum, and the maximum. Unfortunately, all four proposed
functions (and therefore the proposed generalization) have very poor distinguishing
power, in addition to a very weak connection to the original clustering coefficient.
For example, using the proposed generalization in [20] (and any of the four proposed
functions), all the nodes in Figure 4 will have a generalized clustering coefficient of
1, a limitation that was previously reported [20]. On the other hand, our proposed
generalized clustering coefficient successfully distinguishes all three nodes.

Perhaps the most related work to our contribution is the ensemble approach, which 
provides a methodology for generalizing almost all unweighted network measures IS). 
The first step of the method was to normalize edge weights to ensure all weights are 
between 0 and 1 (more restrictive than our approach, which only assumes weights are
non-negative). The next step was to randomly generate an ensemble of unweighted 
networks from the original weighted network, where the weight of an edge represented 
the probability of generating the edge. The final step was to compute the generalized 
unweighted measure as the average of the unweighted measure for each network in 
the ensemble. Despite its relative simplicity, and the ability to generalize almost all 
unweighted measures, the ensemble method suffers from several limitations. Unlike 
our generalization, the ensemble method can only generalize unweighted measures for 
the whole network, not for individual nodes. So, for example, it cannot be used to
generate scatter plots similar to those in Figure 6, where points represent individual
nodes. Another limitation is the need to generate a large number of networks in the
ensemble in order to provide more accurate generalized measures (in order to sample 
edges with very small weights). Furthermore, the ensemble method does not provide 
any partial ordering guarantee. For example, suppose two nodes have the following 
sets of incident weights: A = {9, 8, 2, 1} and B = {9, 5, 5, 1}. Under the ensemble
approach, both nodes will have the same generalized degree (the exact value of the
generalized degree depends on the weights of other edges in the network, which affect
the normalization of edges). Using our proposed generalization, node B's C-degree is
guaranteed to be greater than node A's C-degree by Lemma 4.
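
The comparison of A and B can be illustrated with a small sketch. Two assumptions are made
here and are not taken from the cited work: edge weights are normalized by dividing by the
largest weight in the network (taken to be 9), and the C-degree is 2 raised to the Shannon
entropy (in bits) of the normalized incident weights. Under the ensemble approach the
expected degree of a node is the sum of its edge probabilities, by linearity of expectation,
so no sampling is needed for this check.

```python
import math


def ensemble_expected_degree(weights, max_weight):
    """Expected degree when each edge is kept with probability w / max_weight.

    Dividing by the network-wide maximum weight is a hypothetical normalization.
    """
    return sum(w / max_weight for w in weights)


def c_degree(weights):
    """Assumed C-degree: 2 ** (entropy in bits of the normalized incident weights)."""
    total = sum(weights)
    entropy = -sum((w / total) * math.log2(w / total) for w in weights if w > 0)
    return 2.0 ** entropy


weights_a = [9, 8, 2, 1]
weights_b = [9, 5, 5, 1]

# Both nodes have total weight 20, so their expected ensemble degrees coincide.
print(ensemble_expected_degree(weights_a, 9), ensemble_expected_degree(weights_b, 9))

# B's weights are more evenly spread, so its C-degree is larger (about 3.33 vs 3.02).
print(c_degree(weights_a), c_degree(weights_b))
```

Any normalization that rescales all edges by the same constant gives the same tie, because
the expected degree then depends only on the sum of the incident weights.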

5 Conclusion 

We proposed a new methodology for generalizing measures of unweighted networks. 
The heart of our generalization is the effective cardinality, a novel extension to the 
traditional set cardinality to take weights into account. We illustrated the applicabil- 
ity of our method by generalizing four unweighted network measures: the node de- 
gree, the clustering coefficient, the dyadicity, and the heterophilicity. Furthermore, we 
compared the generalized degree to the traditional degree using four real-world
networks and showed that the generalized degree distribution follows a similar pattern
to the traditional degree distribution, but with steeper decline (larger exponent of the
power-law fit). We also investigated the ratio between the generalized degree and the
traditional degree and showed that on average the ratio is bounded, even for nodes with
high degree. The analysis of the generalized clustering coefficient revealed that nodes
with more uniform incident weights tend to cluster with nodes that have more uniform 
weights among themselves. 